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ABSTRACT. This paper considers the approximate reconstruction of points, x £ ]R D , which are close to a given compact 
d-dimensional submanifold, M, of ]R D using a small number of linear measurements of x. In particular, it is shown that a 
number of measurements of x which is independent of the extrinsic dimension D suffices for highly accurate reconstruction 
of a given x with high probability. Furthermore, it is also proven that all vectors, x, which are sufficiently close to M can be 
reconstructed with uniform approximation guarantees when the number of linear measurements of x depends logarithmically 
on D. Finally, the proofs of these facts are constructive: A practical algorithm for manifold-based signal recovery is presented 
in the process of proving the two main results mentioned above. 



1. Introduction 

In this paper we present a simple reconstruction technique which facilitates compressive sensing for general classes 
of high-dimensional signals with low intrinsic dimension. Two types of models are often considered: sparse models 
and low-dimensional/manifold models. The former type of model assumes that each data point has a sparse represen- 
tation in terms of a (typically known) dictionary O, which geometrically means that data points lie on unions of a small 
number of planes spanned by the elements of the dictionary ||25l . The latter type of model assumes that data possesses 
an intrinsically low-dimensional geometrical structure, for example that of a manifold (see e.g. 11431 171 12011241 . among 
many others) or a union of planes (see e.g. ll45l[T6ll34l l29l), motivated by many applications, for example in image 
processing [30|, computer vision 11421 . and pattern recognition (34). 

Given the low-intrinsic dimension of these models, it is natural to ask whether a small number of linear projec- 
tions ("measurements") of a data point, together with knowledge of the low-dimensional model, suffices to encode 
and reconstruct a data point. In the setting of sparsity, compressed sensing [22 40] not only says that, under suit- 
able assumptions l25l . this is indeed possible, but a convex optimization problem leads to the stable recovery of the 
original data point. In the setting where data lies on a low-dimensional manifold, the work of Wakin et al. J6] |46] on 
manifold-based signal recovery shows that low-dimensional (random) projections provide small distortion embeddings 
for manifolds, but leave open the question of reconstructing a data point. 

Standard compressed sensing [22 , 40 1 deals with the approximation of vectors, x e R D , which can be sparsely 
represented in terms of a given D xn dictionary matrix, O. Note that such O-sparse vectors can be compactly stored 
in a compressed form which is easy to transmit and store. Moreover, they can be recovered from their compressed 
representations when necessary. This compression/recovery problem has been well studied when x is available in 
its entirety before compression (see, e.g., Il35l ). However, in situations where x is costly to observe one may only 
have the ability to collect a very small set of measurements of x to begin with, thus making standard compression 
techniques inapplicable (see, e.g., |2| [T) and references therein). This is the compressed sensing regime, where loss- 
less compression must occur before one determines which vector components or transform coefficients are actually 
important. Hence, the goal of standard compressed sensing becomes to design anmxD measurement matrix M, with 
m as small as absolutely possible, subject to the constraint that a computationally efficient reconstruction algorithm, 
J\. : lR m — > 1R D , exists such that J?I (Mx) « x anytime x e R D is sufficiently compressible with respect to a given 
dictionary matrix <5. 

More precisely, given an integer d <SC n suppose that 
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where e e R" is in the row space of O e R Dx " and 



h = 



arg mm 

y€]R" with ||y||o<d 



be - 



The goal of a compressed sensing method is to approximate x as well as possible by approximating the at most d 
~* 

nonzero elements of fa e R". Furthermore, compressed sensing techniques aim to accomplish this task using as few 
linear measurements of x e R D , 



as absolutely possible. 

Let M e R mxD be the matrix whose / h -row is the measurement vector rhi e R D from Equation Q] above. A 
compressed sensing method consists of both a choice of M € R mxD , and a recovery algorithm, J?l : R m — » R D , such 
that 



in fixed £p,Sq norms, 1 < q < p < 2, for an absolute constant C M € R. Note that M £ R" forms a compressed 
representation of x e R D whenever m < D, which is then stably inverted by J\. Many recovery algorithms, 3K, 
have been developed for solving this problem when O is a square D X (n = D) orthonormal matrix, and MO has 
either restricted isometry Jl3] or incoherence ESES properties (e.g., see |[l2j [TO] [TT1 @4l [37l [38] [36] El ) . Perhaps the 
best such results are achieved by (m — 0(d log(D/d))) X D measurement matrices, M, whose entries are independent 
and identically distributed standard Gaussian random variables. These Gaussian matrices allow for near optimal 
compression (i.e., a near minimal size for m) while still allowing for the existence of recovery algorithms, &l, which 
achieve Equation [2] for an arbitrarily given square orthonormal matrix O. Furthermore, if M is Gaussian then O need 
not be known when the measurements, Mi, are computed: It suffices to know O only during reconstruction with 3K. 

One strand of work in compressed sensing has dealt with extending the results mentioned above concerning square 
orthonormal matrices to include settings where O is a more general (i.e., rectangular) D X n matrix. The first of 
these results extended compressed sensing to include D X n dictionaries, O, whose columns are all nearly pairwise 
orthogonal [41 1. This work shares all of the advantages of the aforementioned results concerning compressed sensing 
when O is square orthonormal matrix (e.g., nearly orthogonal O also do not need to be known until reconstruction 
via J{) when M is a random matrix exhibiting concentration of measure properties (e.g., if M is Gaussian as above). 
These results were later generalized further to allow recovery along the lines of Equation[2]when O has columns with 
less limited forms of coherence and redundancy lfj"4l (e.g., if O is a tight frame). 

In this paper we consider a geometric generalization of standard compressed sensing results for signals which are 
sparsely representable with respect to a square orthonormal matrix, O, by focussing instead on signals which are 
well represented by manifold models. More specifically, herein the Dxn dictionary matrix O utilized in standard 
compressed sensing models will be replaced by a piecewise linear approximation to a given submanifold of R D . To 
understand why this represents a generalization, note that the set of all vectors which are at least d-sparse with respect 
to an orthogonal matrix O defines a form of Grassmannian manifold consisting of O (p d ^ at most d-dimensional linear 
subspaces of R D . Hence, standard compressed sensing methods concerning square orthonormal matrices, 3>, can be 
viewed as dealing with a limited class of Grassmannian manifolds. In contrast, this paper allows for the approximation 
of signals which belong to much more general types of submanifolds of R D . 

The work herein utilizes ideas introduced by Baranuik and Wakin which demonstrate the existence of simple lin- 
ear operators capable of (nearly) isometrically embedding a given compact d-dimensional submanifold of R D into 
jj^O(d log D) w iti! OU t utilizing detailed knowledge regarding the submanifold's structure [6 |. In some sense, this work 
immediately yields measurement matrices, M e R mxD , for manifold-based compressed sensing. However, a complete 
compressed sensing strategy also requires an associated reconstruction algorithm, J[ : R"' — » R D , capable of accu- 
rately approximating points near the given manifold in a computationally efficient fashion. Algorithms of this kind 
were first considered by Wakin in [4-6 1 . Therein, Wakin showed that approximating a given point, x, near a compact 
d-dimensional submanifold of R D via an 0(d log D) linear measurements (i.e., see Equation[T) was possible with high 
probability if the measurements were randomly regenerated for each new x. Furthermore, he concluded that achieving 
strong reconstruction guarantees using one fixed set of linear measurements for all possible points, x, near a given 
compact submanifold of R D was difficult. However, it is important to mention that the results presented in [46 1 were 
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derived independently of any particular numerical reconstruction algorithm, J{. As a consequence, this line of work 
did not result an implementable recovery algorithm with accompanying approximation guarantees. 

In this paper we propose a computationally efficient reconstruction algorithm for manifold-based compressed sens- 
ing and prove accompanying approximation guarantees. In the process, we prove that a given point, x, near a compact 
d-dimensional submanifold of R D can be accurately approximated using 0(d log d) linear measurements with high 
probability when the measurements are randomly regenerated for each new x. This improves on previous results [46] 
by removing all dependence on the extrinsic dimension of the submanifold, D, from the number of linear measure- 
ments required for accurate approximation. Furthermore, we provide stability guarantees for the algorithm when one 
fixed set of 0(d log D) linear measurements are used for all possible points, x, near a given compact submanifold of 
R D . Finally, an empirical evaluation of our method indicates that it also works well in practice. 

Before moving on to discuss our methods and results in more detail we hasten to add that other techniques have 
also been proposed for manifold-based compressed sensing since the initial work of Baranuik and Wakin. Perhaps 
most notable among these are the statistical methods proposed by Chen et al. flTl . Chen et al. use training data from a 
compact d-dimensional submanifold of R D in order to estimate the manifold data's distibution via a Gaussian mixture 
model composed of Gaussians whose covariance matrices are all rank 0(d). They then use the probability density 
resulting from their low-rank Gaussian mixture model to approximate points on the manifold, x, with a maximum 
likelihood estimator when given only linear measurements, Mx e R m . In contrast, we utilize geometric and analytic 
techniques herein and make no attempt to estimate the statistical properties of any observed manifold data. 

1.1 . Methods and Results. As discussed above, the standard compressed sensing setup assumes that the signal to 
be approximated has a compressible representation with respect to an orthonormal basis (or frame lfl4l . or incoherent 
dictionary [41]). Although this is certainly a useful setting, there are many applications where signals might be better 
approximated via more geometrical considerations. For example, consider the setting where the class of potential 
input signals varies continuously as a function of a small number of parameters (e.g., see B7] [46]). In this case it 
makes more sense to consider the approximate reconstruction of signals, x e R D , which are close to a given compact 
d-dimensional submanifold, At, of R D . The optimal approximation for x e R D is then defined to be 



In effect, x opt is the best approximation to x on At. Our objective is to approximate x opt e At C R D given only 
a small number of linear measurements, Mx e R m , where M is an m X D measurement matrix as above. Hence, 
in this paper we seek to design a measurement matrix M € R mxD with m as small as absolutely possible, together 
with a computationally efficient reconstruction algorithm J{ : R m — > R D , so that (Mx) » x whenever x e R D is 
sufficiently close to a given compact d-dimensional submanifold of R D , AlQ 

Note that a manfold, Al, is now taking the place of the dictionary matrix, O e R Dx ", in the standard compressed 
sensing setup discussed above. Of course, it is unreasonable to expect that we can always have an exact representation 
of the signal manifold at our disposal. Instead, we assume that we have a set of locally linear approximations to 
the given manifold which capture the local geometric structure of the manifold's tangent spaces. In fact, such piece- 
wise linear manifold representations are exactly the type of approximations produced by existing manifold learning 
algorithms like LTSA |49| and Geometric Multi-Resolution Analysis |3|. Thus, we assume that the signal manifold, 
At, is approximated by such a method at some point. However, as in standard compressed sensing methods, the 
manifold-based compressed sensing strategies developed below do not require that these piecewise linear manifold 
representations are known when the compressed measurements, Mx e R", are collected. Approximation of the signal 
manifold can be put off until later when signal reconstruction takes place (i.e., one does not need a piecewise linear 
manifold approximation until J?I (Mx) is actually computed). 

Although the manifold-based compressed sensing methods developed herein will work with any locally linear ap- 
proximation to the given signal manifold, At, we will focus on multiscale piecewise linear manifold approximations 
to Al in particular. As opposed to fixed-scale locally linear approximations, multiscale representatons better approx- 
imate non-smooth manifolds, and manifolds contaminated with noise lfT31 [3l. For example, multiscale locally linear 
approximation is particularly beneficial for signal processing tasks involving image manifolds, which tend to be non- 
differentiable in many realistic settings [47 ] . Hence, we formulate our compressed sensing methods below with respect 



Put another way, we require that (Mx) » x p t which implies that (Mx) » x whenever x » x pt. 
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to general multiscale piecewise linear manifold approximations of the type produced by Geometric Multi-Resolution 
Analysis (GMRA) 0. 

As mentioned above, the manifold embeddings of Baranuik and Wakin (§J [46] can be considered as manifold- 
based compressed sensing matrices, for which however no associated recovery algorithms were explicitly defined. 
Indeed, the measurement matrices, M € R mxD , used in the manifold-based compressed sensing methods developed 
below are modifications of their embedding matrices. However, unlike the embedding matrices considered in [6|, the 
measurement matrices considered herein (nearly) isometrically embed both the underlying signal manifold, M, and 
the multiscale piecewise linear approximation to M into R m in a way which preserves the fidelity of the embedded 
multiscale locally linear approximation to the embedded image of M. Accomplishing this requires us to reengineer 
the arguments from [6| using Johnson-Lindenstrauss embedding [32 1 techniques similar to those utilized in [5 1. The 
resulting measurement matrices, M, ultimately justify this complication by allowing us to develop reconstruction 
algorithms which work exclusively with locally linear approximations to M while still preserving approximation 
accuracy with respect to the true manifold, M. 

The reconstruction algorithm, J{ : R m —> 1R D , proposed below consists of two well-studied computational sub- 
routines: a method for solving approximate nearest neighbor problems (e.g., ||3TI [8] H)) in a space of dimension 
comparable to the intrinsic dimension of the data, and a method for solving an overdetermined least squares problem 
(e.g., via the singular value decomposition of the associated matrix). The algorithm works by first using the com- 
pressed measurements, Mi, of x to locate the best local linear approximation to M at x. This is accomplished by 
running a nearest neighbor algorithm on a set of "center points" from near the manifold, each of which represents a 
particular linear approximation to M in a neighborhood of the center point. Because M has low intrinsic dimension, 
and the center points are arranged in a multiscale hierarchy as per [3 |, this search can be carried out relatively quickly. 
To finish, the algorithm then approximates x opt , the best approximation to x on M, by solving an overdetermined least 
squares problem using the linear approximation to the manifold located in the first step. 

In this paper we prove two compressed sensing results for the proposed reconstruction algorithm, each of which 
utilizes randomly generated measurement matrices, M £ R mxD , satisfying a different set of properties. Roughly 
speaking, the first result indicates that m = 0(d log(rf / 6)) linear measurements of a given x e R D suffice to create 
a compact representation, Mx e R m , from which the reconstruction algorithm, M, discussed above will recover an 
approximation to x opt € M satisfying 

||x-^I(Mf)|| < c||x-x opt || + £>. 

Here C € R + represents a fixed universal constant, 5 e R + can be freely chosen, and M is the given d-dimensional 
submanifold of R D . This result provides what is commonly referred to as a nonuniform recovery result, by which we 
mean that the upper bound on ||x - J?I(Mx)|| holds with high probability for each x e R D over the choice of random 
measurement matrix. 

The second theorem proven below provides a type of uniform recovery result which holds with high probability for 
all vectors, x e R D , of a particular class. Simply put, it asserts the existence of a D-dimensional tube around the given 
manifold, T D Ai, within which accurate approximation will always take place with high probability over the choice 
of random measurement matrix M € R mxD . More specifically, the second theorem says that m = 0(dlog(D/<5)) 
linear measurements of any x e T C R D suffice to create a compact representation, Mx e R m , from which the 
reconstruction algorithm discussed above, will recover an approximation to x opt € M satisfying 

C i 

||x- J?I(Mx)|| < c||x - x opt || 2 + — [|x- f opt || + 6. 

yd 

Here, as above, C € R + represents a fixed universal constant and 5 e R + can be freely chosen. 

The reminder of this paper is organized as follows: In the next section we begin by fixing terminology and reviewing 
relevant definitions and theorems. Having established the necessary notation, we then give precise statements of the 
two main results proven in this paper in Section |Z21 Finally, in Section [2731 we conclude Section [2] with a discussion 
of the different types of measurement matrices, M e R mxD , associated with each of our two main results. In Section[3] 
the recovery algorithm, 3K, is presented and analyzed. In particular, the approximation error of for a given x, 
||x - J?I(Mx)||, is bounded for each of the two possible types of measurement matrices, M, considered herein. The 
runtime complexity of J[ is also determined. Next, in Section [4] the number of rows, m, required for each type of 
measurement matrix defined in Section 1231 is upper bounded. This formally establishes the amount of compression 
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possible in our manifold-based compressed sensing schemes. To finish, the compressed sensing methods developed 
herein are evaluated empirically in Section [5] 



2. Notation and Setup 

Given n e N we will define [n] to be the set {0,1,2,. . . ,n\ C 7L. All norms, || • ||, will refer to the standard 
Euclidean norm unless otherwise stated. We will denote an open ball of radius 5 e R + centered at y e R D by (y). 
Our real valued m X D measurement matrix will always be denoted by M. Furthermore, M will always be linear 
Johnson-Lindenstrauss embedding [32] [27] [21] [33] of a finite set Scl D into R"'. 

Definition 1. Let e e (0, 1/2), and S C R D be finite. An mxD matrix M is a linear Johnson-Lindenstrauss embedding 
ofS into R m if 

(l-e)||M-i^| 2 < \\Mu-Mv\\ 2 < (l+e)||«-iJ|| 2 
for all u,v e S. In this case we will say that M embeds S into R m with e-distortion. 

The following theorem is proven by showing that anmxD matrix with randomized entries will satisfy Definition[T] 
for a given set S C R D with high probability whenever m is sufficiently large (e.g., see lETl ). 

Theorem 1. (See [32] [21].) Lete e (0, 1/2), and S C R D be finite. Letm = 0(e~ 2 log |S|) be a natural number. Then, 
there exists anmx D linear Johnson-Lindenstrauss embedding ofS into R"' with e-distortion. 

For the remainder of this paper Ai will denote a compact d-dimensional submanifold of R D with d-dimensional 
volume V. We will characterize results concerning any such manifold M via its reach [[26) , denoted reach (Ai), which 
is defined as follows: Let 



D(M) = {xe R D | 3 a unique y e M with ||x - y|| = d(x,M)} 



and 



tube, (M) = [xe R D | d (x, M) < r j , 
where d(x, M) is the standard Hausdorff distance. We then define 
(3) reach (M) = supjr > | tube,- (M) c D (At)}. 

Intuitively reach (Ai) is the radius of the largest possible non-self-intersecting tube around At. For example, if M 
is a d-sphere of radius r, then reach (Ai) = r. The reach of a manifold is particularly useful because it allows the 
development of concise bounds for many manifold properties of interest (e.g., curvature, self-avoidance, packing 
numbers, etcetera). See [26. 39][6][T8] for more details. 

Given a compact set S C R D we define a b-cover ofS to be any finite set S C R D with the following property: 

VfevS, 3y e S such that x eS 6 (y). 

We will refer to a 6-cover of S, S, as minimal if |S| < |S| for all other 5-covers of S, S. Hereafter, Q (S) will denote a 
minimal 6-cover of a given compact set S in R D . The following lemma, easily proven using results from [39] , bounds 
|Cg for any compact d-dimensional Riemannian manifold, Ai, in terms of 6 and reach (Ai). 

Lemma 1. (See [39 \.) Let Ai C R D be a compact d-dimensional Riemannian manifold with d-dimensional volume 
V, and suppose that 5 £ R + is less than reach (Ai). Then, any minimal b-cover of Ai, Ca (Ai), will have 



Q(M) 



2$ b d 



In order to help us develop a practical recovery algorithm we will assume we have a multiscale piecewise linear 
approximation of Ai of the type yielded by GMRA [3]. Let / e N and Kq,K\, ,..,KjeN, For each j e [/] we assume 
that we have a set of affine projectors, 



P; = jPyjt : R D -> R D \ke [Kj]}, 



which approximate Ai at scale j. More precisely, these affine projectors will collectively satisfy the three following 
properties: 
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(1) Affine Projections: Every has both an associated vector, Cyt e R D , and an associated orthogonal d X D 
matrix, Oy / c , so that 

P //Jt (x) = QPjfiiJk (x- Cj,k) + c jrk . 

(2) Dyadic Structure: There exist two universal constants, C\ e R + and C 2 e (0, 1], so that the following 
conditions are satisfied: 

(a) K ; <K /+ iforall; e[/ -l]. 

(b) llCffo ~~ > Ci • 2"-' for all j 6 [/] and fci,^ e [Ky] with fci + /c 2 . In other words, the Cy^-vectors at 
each scale j e [/] are well separated from one another. 

(c) For each j e [/] - {0} there is exactly one well defined parent function, py : [Kj] — > [Ky-i], with the 
property that 

\\cj J{ - Cy-wJI < C 2 min ||c;, c - Cj-w\\ ■ 

Together these / parent functions collectively define a tree structure on the ^-vectors. In particular, 
each cjyfc with A: e [i<Co] is a root node while each with k 6 [i<Cj] is a leaf. 

(3) Multiscale Approximation: When M is sufficiently smooth the affine projectors at each scale j e [/], 
{Py,)t I k £ [Kj]}, approximate M pointwise with error O (2~ 2 -'). 

(a) There exists a constant jo e [J — 1] so that 6 tube Cl . 2 -y-2 (Al) for all j e [}] - [;'o] and k e [JCy]. Note 
that jg is a function of the constant C\ from Property |2bl We will generally assume that a jo e [/ - 1] 
satisfying this condition exists when C\ is chosen to be as large as possible above. 

(b) Foreach/e [/]andxe R D letft:y(x) e [Kj] be such that Cj k ^ is one of the nearest neighbors of x in the 

set {c ; v; c j / = j, ke That is, for each j e [J], let 

kj (x) = argmin||x - Cy^||. 

MKj) 

Then, for each x e M there exists a constant C e R + such that 

-V m (x)\\<C-2- 2 i 

for all e [/]. In addition, affine projectors associated with Cyj-vectors that are nearly as close to any 
x e M as Cjk (x) can a ^ so accurately represent £ Hence, for each x e M their exists a constant C 6 R + 
such that 

||x-P^(x)|| < C-2~) 
for all j e [}] and fc' e [Kj] satisfying 



x- c j/kt \\ < 16 ■ max| x - c jJ( ^ , d ■ 2 ' x | 



Note that the affine projectors approximate M more accurately as the scale j e [/] increases. The finest scale 

resolution is obtained when j = J. See O for details. 
The remainder of this paper is devoted to analyzing the number of measurements required in order to approximately 
reconstruct an arbitrary point x e R D which is nearly on a compact d-dimensional submanifold M C R D . In order 
to yield substantive progress we must first assume some knowledge of M (i.e., our manifold-based signal dictionary). 
Thus, we will assume below that we have a set of affine projectors, {lP;,)c | j G []],k e [Kj]}, for M. as discussed above, 
and will primarily focus our analysis on bounding the number of measurements, m, sufficient to accurately compute 
^jk {x) (*) f° r any given input vector x e R D and scale j e [/]. 

2.1. The Goal: Approximating Manifold Data via Compressive Measurements. Let P = jp ; | j e [/]J be a mul- 
tiscale piecewise linear approximation to M as discussed above. Given such a P we can accurately approximate any 
xe Ale R D (e.g., see Property l3bV However, herein we are primarily interested in approximating arbitrary vectors, 
x e R D - M, as well as they can be approximated by a nearest neighbor on the manifold, x opt € M. As we shall see, 
P can be utilized for this task. The following lemma demonstrates that Py^ (f) (*) approximates any vector x e R D 
nearly as well as x op t € M does. 
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Lemma 2. Let M C R D be a compact d- dimensional Riemannian submanifold q/R D , and x £ R D . Furthermore, let 
TPj = I k £ [Xy]J be a scale j £ []] GWRA approximation to At Then, 

||x-P^(x)|| < 17||x-x opt || +0(T>) 

for all k' £ [Kj] satisfying 



\\x-c hk ,\\ < 



X - C: 



, Ci ■ 2~t 



')■ 



Proof: Let 5 = max 



x - c 



let ft:' 6 [Ky] be such that ||x - c^' 
essentially finished since 



, Ci ■ 2 ' x >, where Ci e R is defined as in Property |2bl above. Furthermore, 



< 86. To begin, suppose that ||x - Cjjc>\\ < 17 ||x - 



j|x-P^(x)|| = IZ-O^Oy^Jff-C^)! < ||f-Cy^|| < 17\\x- 



^ opt 1 1 



In this case we are 



Thus, we will hereafter assume that ||x - Cjp || > 17 ||x - x opt || without loss of generality. 
Repeatedly applying the triangle inequality we see that ||x - P^' (*)|| is bounded above by 



X X pt ~t~ 



*opt - P,,)c' (^opt) + I ~Pj,k' (*opt) - P;,i' (X) 

The third term in the sum immediately above can be bounded by 

P;>'(^opt) -P^'(x) = 0^,0^, (x-X opt )| < ||x-X opt 



To bound the second term we note that ||x - c,/ c /|| > 17 ||x - x opt || implies that 
Therefore, 



1 \\ 



^Opt 1 1 



'opt 1 1 



max 



< 9 x - x, 



opt | j 



Property |3b]now guarantees that 



The result follows. □ 



, Ci • I''- 1 ) < 16 ■ max 



*opt C j,kj(t opt ) 

, Ci ■ 2^'- 1 } 



> 9 x- 



l opt| 



A;(*o P t) 

*opt - Pp:' (^opt)|| ^ C • 2~L Hence, we now have 
||*- P/,(c' (x)\\ < 2 \\x - X op t|| + C • 2~K 



C ° P t C j,k^ opt ) 



d ■ 2~i- 



/8. 



In this paper we are primarily concerned with achieving approximation results akin to Lemma [2] utilizing com- 
pressive measurements. This will allow us to extend the successful sparse approximation techniques and results of 
compressive sensing to the recovery of signals which belong to low dimensional submanifolds of R D . In order to 
accomplish this goal we must first propose and then subsequently analyze both a measurement operator and an asso- 
ciated recovery algorithm. Furthermore, in order for it to b e of practical value, we must demonstrate that the proposed 
recovery algorithm is computationally efficient, easy to implement, and provably accurate. We begin this process 
by considering our measurement matrices in Section 12.31 We then develop a practical reconstruction algorithm in 
Section[3] Before we begin, however, we will first state the main results proven herein. 

2.2. Main Results. In the statements of the two propositions below, C £ R + is an absolute universal constant which 
is independent of x, At At's GMRA approximation, etcetera. Note that the upper bounds provided for this constant 
in Section[3]are almost surely quite loose. We state our first result. 

Proposition 1. Fix precision parameter 6 £ R + and let x £ R D . In addition, let Pj, ] = O (log [1 / (5 reach (AI))]), be 
a GMRA approximation to a given compact d-dimensional Riemannian manifold, M C R D , with volume V. Finally, 
let 

d 



m = 0\d log 



5 reach ( M) 

i 



+ logV 



be a natural number, and define J?I : R m — > R D to be Algorithm\J}from Section\3\below. Then, there exists an m X D 
matrix, M, such that 

||x - J?T(Mx)|| < C • ||x - x opt || + b 
with arbitrarily high probability. Furthermore, J{ (Mx) can be evaluated in -time. 
Proof: The result follows from Theorem|3] the first part of Theorem Q and the discussion in Section lXTl □ 

Proposition Q] provides a nonuniform recovery guarantee for each given x e R D . If desired, bounds could be al- 
tered to depend on the desired probability of success, p e (0, 1), by including an additional multiplicative factor of 
0(log(l/l - p)) in both the runtime of the algorithm and the upper bound for m. The measurement matrices, M, 
referred to by the proposition can be any standard Johnson-Lindenstrauss embedding matrix (e.g., a Gaussian random 
matrix, a random orthogonal projection, etc.). Hence, they are well understood. The worst case theoretical runtime 
complexity of the recovery algorithm is polynomial in m. We refer the reader to Section |5]for an empirical evaluation 
of the recovery algorithm's computational efficiently in practice. Finally, we note that the number of required mea- 
surements, m, is entirely independent of the extrinsic dimension, D. Next, we state a uniform approximation guarantee 
for Algorithm Q] 

Proposition 2. Fix precision parameter 5 e R + . In addition, let P/, / = O (log [1/(6 reach (At))]), be a GMRA 
approximation to a given compact d-dimensional Riemannian manifold, M C R D , with volume V. Finally, let 

OT ^ ( dl0g ( 5reac P h ( M) ) +1 °g y 

be a natural number, and define J?I : MI" — > 1R D to be Algorithm\J]from Section\3\below. Then, there exists an m X D 
matrix, M, such that 

C_ 

for all x £ R D with 



■Jl(Mx)\\ < C\\x-x opt \\ 2 + — \\x-x opt \\ l +5 



2||x-x opt || 2 + —jz \\x - XoptHj < max|||x-c^^||, 5 

Furthermore, J{ (Mx) can be evaluated in worst case [7.°^ log V + O (md 2 + dDyj-time. 

Proof: The result follows from Theorem^] the second part of Theorem|2j and the discussion in Section lTTI □ 

Proposition|2]is best interpreted as a general stability result. It guarantees that Algorithm[T]will uniformly approx- 
imate all points which are sufficiently close to the manifold M (i.e., the points need not be exactly on AT). Thus, 
Algorithm [T] has some limited robustness to arbitrary additive input noise. The examples in the experimental section 
suggest that the constants involved are very mild. 

2.3. The Measurement Matrix. In the process of developing an algorithm to approximate ~P^ k ^ (x), and subse- 
quently demonstrating its accuracy, we will require some knowledge regarding our m X D measurement matrix M. 
We shall consider two sets of assumptions regarding M's interaction with both the manifold M and our given set of 
affine projectors for M at each scale e [/]. Each set of assumptions will ultimately result in both different approxi- 
mation guarantees for our reconstruction algorithm, and different measurements bounds (i.e., sufficient upper bounds 
on m) for M. We will postpone discussion of how to create M and how to bound the number of rows it must have 
in order to satisfy each set of assumptions below until Section|U In Section[3]below we will begin by presenting our 
reconstruction algorithm together with approximation error bounds under each set of assumptions regarding M. 

Let x e R D and P = |p ; | ;' e [/]J be a fixed set of affine projectors for M for each scale j e [/]. Fix e e (0, jj. In 
Sections [3]and|4]we will assume that our mxD measurement matrix M satisfies each of these sets of assumptions in 
turn. 

(1) Assumption Set 1: Required for Nonuniform Recovery of a Given x e R D (see Proposition []} 

(a) Let Si C R D be 

Si = {Qjfljjc (x - c>) | j e [J], k e [Kj]} J jf - c> | ; e [/], k e [Kj]} [j jo 



We will assume that 
(1 



■ e ) \\y 

for all y,z 6 Si. 
(b) Furthermore, we will assume that 



zf < 



|My-Mz|| 2 < (l+e)||y-z||' 



(1-e) 



< (l + e) 



for all / e [/], fc e and y e R E 



(2) Assumption Set 2: Required for General Stability (see Proposition 

(a) Let S 2 =MU {c},* | ; e [/], k e [Kj]) c R D . We will assume that 

(1 -e)||y-z|| 2 < ||My-Mz|| 2 < (1 + e) ||y - z\f 

for all y,z e S2. 

(b) Furthermore, we will assume that ||My|| is bounded above by Em (y) for all y e R D , where Em : R D 
R + is a continuous function with Em (o) = 0. Em is discussed in detail in Section l4~2l 

(c) As before, we will assume that 



for all / 6 [/], fc £ [Ky], and y e R D . 
(d) Finally, we will also assume that 



(1 



e)y-Ti, k ($)\\-2-I < \\My-MP, k (y)\\ < (1 + e) \\y - P i/)fc (y)|| + 2~ ] 
for all / 6 [/], fc £ [JCy], and y 6 At. 

Note that the critical difference between the two sets of assumptions above concerns the treatment of x e R D and 
x op t e M C R D . If possible we would like to obtain measurement bounds which are independent of the ambient 
dimension, D. Since an arbitrary vector x may contain a substantial portion of its energy in the subspace orthogonal 
the tangent space to M at x opt , results which are entirely independent of D generally appear to be unattainable unless 
our measurement matrix happens to successfully preserve information in the direction of x - x pt- We assume that M 
preserves lengths of vectors in the general direction of x — x opt as part of our first set of assumptions. In the second 
set of assumptions we do not. It is primarily this difference which leads to different measurement bounds and error 
guarantees in each case. 

3. The Reconstruction Algorithm 

We will ultimately upper bound the number of measurements required in order to approximate a given x e R D 
which is close to M C R D via the simple reconstruction technique presented in this section. In doing so we will 
require that the reconstruction algorithm approximates x nearly as well as the vector on M closest to x, 

x opt = argmin||x - y||, 

approximates x. Our first order of business, therefore, will be to derive explicit error guarantees for the reconstruction 
technique considered herein which demonstrate that it is indeed "near-optimal" in the sense discussed in Section Q] 
above. Let J{ (Mx) e R D denote the output of our reconstruction procedure for a given input x e R D . We wish to 
bound the approximation error 

\\x-Jl(Mx)\\ 

in terms of the optimal approximation error, \\x - x opt \\, and an additive error term of size O (2~^ whenever possible. 
Before this task can be accomplished, however, we must first describe the recovery algorithm we will use to calculate 

Our reconstruction procedure uses compressive measurements of x in order to approximate Pyjt (f) 0*0 m two steps 
(see Algorithm Q] above). First, the compressive measurements of x are used to determine a "center" vector, cw, 
which is nearly as close to x as its nearest neighboring center, Cj k .(jf), is- This step is guaranteed to work well as 
long as our measurement matrix, M, preserves appropriate distances between x and all the center vectors at scale j. 
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Algorithm 1 Approximate P^.m (*) 



i: Input: Measurements Mx e R m , Scale e [/], Approximation P ; = {p^ | k e [JC ; ]j to manifold M c R D 
2: Output: J?T(Mx), an approximation to P^/j) (x) ~ x 



3: it' 



4: M 



argmiiw.] ||Mf-Mc}, fc || 



6: Output ^ (Mf) 



MO T , , t? - Mx + Mc ; p 



Next, an accurate projection of x — Cw onto the d-dimensional subspace associated with c\y is found by solving an 
overdetermined least squares problem. This step will also work well as long as our measurement matrix M is well 
conditioned on all of the d-dimensional subspaces associated with the scale j center vectors. As we demonstrate below, 
the two sets of assumptions for M in Section l23l are sufficient to guarantee that both steps work well. 

The following lemma guarantees that the center found in line 3 of Algorithm Q] is nearly as close to x as x*s true 
nearest center is. 

Lemma 3. Fix e € ^0, jj. Let M C 1R D be a compact d- dimensional Riemannian submanifold o/R D , and x £ R D . 

Furthermore, let Py = {Py,jt I & e I-^-;']} ^ e a sca ^ e j e [/] GWRA approximation to At Then, if our mxD measurement 
matrix M satisfies Assumption Set 1 in Section \2~3\ above, line 3 of Algorithm\l\will select ak' € [Kj] which has 



\\ x - c hk'\\ ^ 



l + e 
l-e 



\\X - C; 



If our m X D measurement matrix M satisfies Assumption Set 2 in Section \2~3\ above, then line 3 of Algorithm\l\will 
select ak' G [Kj] which has 



(4) 



l+e 
l-e 



x - c 



+ 1 + 



l+e 
l-e 



opt | 



Proof: Using the first set of assumptions for M together with the definition of k' e [Kj] from Algorithm Q] we can see 
that 



I* ~* 1 

j | % — c y,fc' j 



l-e 



|M£-McyJ 



l-e 



Mx - Mc' 



l+e 
l-e 



x - c 



;,Jc,(f) 



We now turn our attention to the case where M satisfies the second set of assumptions. We have that 



||*-c^|| < ||x-x op t|| + ||x opt -c ; ^|| < ||x-x opt || + J -^— e ■ ||Mx pt-Mcy^| 

+ (|| M ( f " ^opOll + || Mf - Mc %dl) 



< X - X, 



< x - X, 



o P t|| + -w ' || V ~~ -^opvii 
Focusing on the first and third terms in the line immediately above, we note that 



/l+e 




/l-e 


x °pt c ;,ic,(#) 



l+e 
l-e 



X °P' c ;,*y(*) 



l+e 
l-e 



x - c 



l+e 
l-e 



x - x. 



opt | 



The result follows. □ 



Next, we prove a lemma which guarantees the accuracy of the solution of the overdetermined least squares problem 
produced by line 4 of Algorithm Q] 
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Lemma 4. Let M C R D be a compact d- dimensional Riemannian submanifold q/R D , and x e R D . Furthermore, let 
TP] = ^Pj,k I k £ V^j\\ be a scale j G [/] GWRA approximation to M, and k' £ [Kj] be the value computed by line 3 of 
Algorithm\l] Then, if our m X D measurement matrix M satisfies either set of assumptions in Section \2~3\ above, line 
5 of Algorithm\l\will produce an J{ (Mx) e 1R D which has 

2 ii ii 
||p^(f)-J?I(M^|| < — -p[f-P^(f)]||. 

Proof: Let u ' e R rf be as defined in line 4 of AlgorithmQ] Given either set of assumptions for M we will have 

1*5**' - ®J**i* ( f - c >)|| * (|| M ^" ' " M ( f " c >)[| + H z " (|> >' tI, v.-| (* ~ c >)||)' 

where I is the D X D identity matrix. By the definition of it ' in Algorithm[T]we can now see that 



The stated result follows. □ 



1-e 



M[/-^3.^](x-c>)|. 



Finally, we demonstrate the accuracy of the output of AlgorithmQ] as an approximation to x. 
Theorem 2. Fix e e (0, |). Let M C R D fee a compact d-dimensional Riemannian submanifold o/R D , and x G R D . 

Furthermore, let Py = {Py,t: I ^ £ L^-;']} ^ e a sca ^ e j e [/] GWRA approximation to At TTien, //oMr mxD measurement 
matrix M satisfies Assumption Set 1 in Section |Z31 above, Algorithm\J}will output a point, jTI(Mx) G Pt D , which 
satisfies 

\\x-M(Mx)\\ < 100.3 ||x - x opt || +o(2.-i). 

Now suppose that our m X D measurement matrix M satisfies Assumption Set 2 in Section \2~3\ above, and that P; 
i's a scale j GWRA approximation to Mfor some j > jo ( revisit Properties\3a\and\3b\in Section]2\for the definitions of 
the constants jo, C\, and C). Furthermore, suppose that x G R D — M has 

(5) 2 • E M (x - f op t) < (8 VT^e" - Vl+e) ||x - c^ (f) | - ( Vl - e + Vl+e) ||x - x opt || . 
Then, Algorithm\T\will output a point, (Mx) £ R D , which satisfies 

(6) ||x-J?I(Mx)|| < 220||f-x op t|| +4-E M (x-f op t) + o(2^). 



Proof: To begin we note that 

(7) ||x- M(Mx)\\ < ||x-P y ^(f)|| + ||p^(f)-^(Mx)|| 

where k' e [Kj] is defined as in line 3 of Algorithm Q] The first set of assumptions for M together with Lemmas [2] 
and [3] tells us that 

||x-P^(x)|| < 17||x-x opt || + °( 2 ~') 
since e € (0, jj. Furthermore, the first set of assumptions for M together with Lemma|4]indicates that 

\\P ht (x)-W(Mx)\\ < -l_.|M[f-P i/ifc ,(f)]| < 2^j^-\\£-P jik ,(x)\\. 

Hence, we obtain the stated bound in the first case. 

Now assume that M satisfies Assumption Set 2 in Section l2~3l We will begin by bounding the P;,jf (x) — 3\ (Mx) 
term in Equation|7] Applying Lemma|4]and then utilizing our second set of assumptions regarding M we can see that 

2 1 1 2 

||p //t (x)-yi(Mx)|| < ■ m[x-P^(x)]| < Y^(|| Mx - M ^opt|| + ||Mx pt-MP^(x)||) 



1-e 



(8) 



a 1+£ \\-> -> 

< 2 ■ ■ x - x, 

1-e 11 



Mx - Mx opt [| + ||Mx opt - MP jrk , (x opt )| + J MP j/t (x opt ) - MP j/k , (x)|| j 
+ (ll Mx ~ Mf o P t|| + |Mx op t - MPjj? (^opt)||) ■ 



In order to bound the last term in Equation[8]above, we note that ||x-Cy/ c /|| < 8 



whenever 



whenever Em (x — x pt) 
x-Cj rk ,\\ > V7\\x- 

' -*opt|| 



satisfies Equation^ Therefore, we will have ||x pt - Cjj? || < 16 x opt _ c j,kj(g op ,) 
by an argument identical to that presented in the second paragraph of the proof of Lemma[2] Hence, Property I3b1in Sec- 
tion|2]guarantees that x opt - Py,^ (x pt)| ^ C ■ 2~i whenever ||x - Cjj c >\\ > 17 ||x - x pt||- Item (d) of Assumption Set 

2 in Section l2~3l now guarantees that |Mx opt - MPy ^ (*opt)| will also be o(l~^ whenever ||x - Cj,fc\\ > 17 ||x - x opt ||. 
To finish, suppose that ||x - Cj /k '\\ < 17 ||x - x op t||. Continuing to bound the last term of Equation [8] in this case we 



obtain 



1 -e 



M£ op , - MPjj„ (x opt )| < (|| Mf o P t - Mc jrk ,\\ + M& k ,®j, k , (f opt - c>)||) 



< 2- 



VT 



+ e 



1 + e 



1-e 



| Xopt ~~ || ^ ' 'J ~ 1 1 Xopt — 



< 36 ■ ^ _ fc (l + Vl +e) ||x - . 
Combining this bound with the previous paragraph concludes the proof. □ 



-opt 1 1 



Theorem[2]demonstrates that Algorithm Q~]can stably approximate vectors x e R D - M as long as the measurement 
matrix, M, satisfies one of the two sets of assumptions detailed in Section l2~3l However, the strength of the approxima- 
tion guarantee depends on which set of assumptions M satisfies. When M possess the attributes listed in Assumption 
Set 1 (most notably, attribute (a)) the vector returned by Algorithm[T]will always provide an approximation to x whose 
error is a within a constant multiple of the optimal approximation error. When M satisfies Assumption Set 2, on the 
other hand, Algorithm[T]is only guaranteed to provide near optimal approximations for vectors, x, which are relatively 
close to the manifold M. 

3.1. Practical Implementation of Algorithm [TJ In line 3 of Algorithm Q] we want to locate the nearest neighbor 
of Mi e R m from the set {Mc ; ;jc | k e [X ; ]J C MP 1 . This can be accomplished naively in 0(mK ; )-time. However, 
Kj is potentially large in the worst case (see Lemma |6]below). Therefore, it is important to note that the runtime's 
dependence on Kj can be greatly reduced in practice with the aid of standard space partitioning techniques (e.g., by 
building a k-d tree to solve the nearest neighbor problem). Alternatively, other fast nearest neighbor methods could 
also be utilized (e.g., see ll3Tl l8ll4l and the references therein). Due to the dyadic structure of our Cy ^-vectors, the worst 

case theoretical runtime complexity of line 3 can be improved slightly to -time by using cover trees [8 |Jj 

Alternatively, if it suffices to find a (1 + 6)-nearest neighbor of Mi with high probability, we can utilize even faster 
algorithms which run in m°^-time (see Proposition 3 in J31 1 together with the bound for m in Theorem[3]below). 

Line 4 of Algorithm[T]requires the solution of an overdetermined least squares problem. This can be accomplished 
in 0(md 2 )-time via the singular value decomposition of MOT^,. Furthermore, the solution can be computed accurately 

since both sets of assumptions in Section 12.31 guarantee that MOT fc , is well conditioned. Finally, explicitly forming 

(Mx) in line 5 of Algorithm[T]can be accomplished in 0(Dd)-time. The total runtime of Algorithm[T]will therefore 
be O (d(md + D) + Tnn), where Tjvn bounds the runtime of the nearest neighbor algorithm used in line 3. 

4. Upper Bounds on the Number of Required Measurements 

In this section we will bound the number of rows, m, needed in order for our m X N measurement matrix, M, 
to satisfy each set of assumptions discussed in Section 12.31 In order to do so, it will suffice to let M be a linear 
Johnson-Lindenstrauss embedding of a well chosen set of points in R D into Pt m . Of course, this set of points will 
vary depending on which set of assumptions from Section [2731 we want M to satisfy. Below we consider each set of 
assumptions separately. However, we will first establish two lemmas which will be useful in both cases. 

Lemma 5. Let e e ^0, jj. Furthermore, let j e [/] and k € [Kj] denote an affine projector Pyjt (see Property\l\in 

Section^. Then, there exists a finite set of vectors, Qj tk C Xjjc = joJj.Oy^i/ | y € R D | with \Qj,k\ ^ (12/e) d + 1, such 



'Here V is the volume of the d-dimensional manifold M C B. . 
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that 

{\-e)\^ hk y\ < Im^o^I < {l + e )\^ jrk g 
for all y e R D whenever M embeds Qjj c into R m with e/2-distortion. 

Proof: We let Q' k be a minimal e/4-cover of the d-dimensional unit ball in Xy^ centered at € Xy^. Now set 

Qj,k = Q'j k U {o}- Th e stated upper bound of \Qjjc\ follows from existing covering results (see for references). 
Furthermore, if M embeds Qn into JR!" with e/2-distortion it is easy to see that 

(l-e/2)IMI < \\Mq\\ < (l + e/2)|| 9 || 
for all q e Q ; £. The remainder of the proof now directly parallels the proof of Lemma 5.1 in 131. □ 



Lemma 6. Fix } e N and let Py, j e [/], be a GMRA approximation to a given compact d-dimensional Riemannian 

manifold, Ai C R D , with d-dimensional volume V '. Furthermore, suppose that f e [J]-|max jjo/ log2 ( re ach(A-i) ) ~ ^jl 
where jo and C\ are defined as in Property \3a\ of Section [2] Then, the number of affine projectors at scale f, Ky, is 

bounded above by 2d( j^ 15> • V ■ + lj 2 . 

Proof: We know that S Cr2 -i'-2 (cj',k) H M is nonempty for all k £ [Kj>] since f > ]q. Now consider a minimal 
Cj ■ 2~' ~ 2 -cover of Ai, C Cl . 2 -y-2 (M). It is not difficult to see that every Cj* ^ will be contained in ^Ci-2 - J /-1 (.V) some 
y e C Ci . 2 ,'-2 (AI). Furthermore, there can be no y e C c 2 -y-2 (At) such that two distinct are contained in the 
same ball, S Ci . 2 -/-i (y), by Property I2b1 in Section|2] Hence, Kf < |C Ci . 2 -/'-2 (Ai) |. Applying Lemma[T] concludes the 
proof. □ 

We are now prepared to upper bound the number of rows required by our mxN measurement matrix, M, in order 
to satisfy each set of assumptions listed in Section l231 

4. 1 . Bounding the Number of Rows Required to Satisfy Assumption Set 1. 

Theorem 3. Fix e e (o, jj, x e R D , and } E N sufficiently large. Furthermore, let Py, j £ [}], be a GMRA 
approximation to a given compact d-dimensional Riemannian manifold, Ai C R D , with volume V. Then, there exists 
anmxD matrix, M, which satisfies Assumption Set 1 in Section |Z?1 with m = O (de~ 2 (/ + log(d/e)) + e~ 2 log Vj . 

Proof: The set Si C R D defined in item (a) of Assumption Set 1 has |Si| < 2(/ + l)Kj + 1. Furthermore, applying 
Lemma [5] to all at most (/ + l)Kj affine projectors yields a set of size at most (/ + l)Kj ((12/e) d + l) for item (b) of 
Assumption Set 1 . Lemma|6]together with Theorem[T]now finishes the proof. □ 

It is important to recall that Theorem[T]is proven by showing that a random matrix will (nearly) isometrically embed 
a given subset of R D into R'" with high probability. In the proof of Theorem|3]above, Theorem[T]is applied to embed 
a set which depends on the given x e R D we are ultimately interested in approximating (i.e., the set Si defined in 
Section l2~3l depends on x). Thus, Theorem[3]provides us with a high probability recovery guarantee for each separate 
x e R D on which we apply AlgorithmQ] 

4.2. Bounding the Number of Rows Required to Satisfy Assumption Set 2. We will begin this section by consider- 
ing item (b) of Assumption Set 2. Among other things, this will allow us to finally define the function Em '■ R D — » R + - 
However, we must first define the Restricted Isometry Property 1 12 1 on which the subsequent discussion relies. 

Definition 2. Let D, d e N, and e e (0, 1). AnmxD matrix M' has the Restricted Isometry Property, RIP(D,d,e), if 

(9) (1 - e) ||x|| 2 < ||M'x|| 2 < (1 + e) \\x\f 

for all x G R D containing at most d nonzero coordinates. 



We have the following lemma. 
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Lemma 7. Let e e ^0, There exists a finite set of vectors, Q C X = jj/ | y e R D contains d nonzero coordinates^ 
with |q| < (^) ^(12/e) rf + lj, swc/z f/zaf an mx D matrix M' has the RIP(D,d,e) whenever it embeds Q into R m with 
e/2-distortion. Furthermore, any such matrix M' will have ||M'i/|| 2 bounded above by 



E M ,(y) = VTTi. 



14+^14 



/or a«ye R D . 

Proof: To prove that M' has the RIP(D,d,e) we employ an argument similar to the proof of Theorem 5.2 in fl5)- To 
begin, we define ey, j e [D] - (0}, to be the the / h row of the D X D identity matrix. Then, for each d-element subset 
S = [ji, . . ., jd] c [D] - (0), we define Xs to be the d-dimensional subspace spanned by ej ir ..., ey. Next, we let Q' s 

be a minimal e/4-cover of the d-dimensional unit ball in X$ centered at 0, and define Qs = Q' S U {oj as per Lemma[5] 
Finally, we let 

Q~ |J Qs- 

Sc[D]-{0), \S\=d 

The upper bound on |Q| follows immediately. 

Now suppose that M' embeds Q into R m with e/2-distortion. Every x e R D containing at most d nonzero coordi- 
nates belongs to some subspace, X$, whose associated set, Q$ C Q, is also embedded into R m with e/2-distortion by 
M'. Hence, a trivial variant of Lemma[5]guarantees that every such x will satisfy Equation|9] Therefore, M' will have 
the RIP(D,d,e) as claimed. The equation for Em' now follows from Proposition 3.5 in [1361 . □ 



We are now sufficiently equipped to consider item (a) of Assumption Set 2 in Section 1231 We have the following 
lemma. 

Lemma 8. Fix e e ^0, |) and J e N — [max j/o/ log 2 ( reach(Al) ) — ^11' wnere jo an d C\ are defined as in Propertv \3a\ 
of Section [2] In addition, let Py, j € [/], fee a GMRA approximation to a given compact d-dimensional Riemannian 
manifold, M C R D , with d-dimensional volume V. Then, there exist absolute universal constants, C3, C4 € R + , which 
are independent of both M and its GMRA approximation, together with a finite set of vectors, B C R D , so that any 
mxD matrix M' which embeds B into R m with (C3 • e)-distortion will satisfy 

(l-e)||y-2j| 2 < ||M'y-M'z|| 2 < (1 + e) \\y - z\f 
for all y,z e M U {c h k \ j e Ul k e [Ky]} C R D . Furthermore, B C R D will have 

D \ C 4 d \ 

IS I = o\2 Ci}d V 2 ' 



e ■ min {1, reach {M)} ■ min{l,Ci} 



Proof: See Appendix lAl □ 

Furthermore, a modification of the proof of Lemma [8] yeilds our final lemma concerning Assumption Set 2 in 
Section l2~3l We have the following result regarding item (d) of Assumption Set 2. 

Lemma 9. Fix e e ^0, |j and } e N — [max j/o/ log 2 ( reach(Al) ) — ^}]' wnere jo an d C\ are defined as in Propertv \3a\ 
of Section [2] /n addition, let Py, j G [/], fee a GMRA approximation to a given compact d-dimensional Riemannian 
manifold, M C R D , wi//z d-dimensional volume V. Then, there exist absolute universal constants, C5, C(, € R + , which 
are independent of both At and its GMRA approximation, together with a finite set of vectors, B' C R D , so that any 
mX D matrix M' which embeds B' into R m with (C5 • e)-distortion will satisfy 

(l-e)||^-Py^(^)||-2^ < \\Mj-M'P jik (y)\\ ^ (l + e)||?-Pjjt(tf)||+2-> 

for all j e [/], k e [KA, and y 6 At Furthermore, B' c R D vw'/Z Ziave 



e • min {1, reach (At)} • min{l,Ci] 
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Proof: See Appendix iBl □ 



We are finally ready to provide a useful upper bound for the number of rows required in any measurement matrix 
satisfying Assumption Set 2 in Section |2~3| We have the following theorem. 

Theorem 4. Fix e e (o, |) and J e N — ^max j/o, log 2 ( te ach(M) ) ~~ ^}]» wnere jo an d C\ are defined as in Propertv \3a\ 
of Section^ In addition, let Py, j £ [/], be a GMRA approximation to a given compact d-dimensional Riemannian 
manifold, Ai C R D , with d-dimensional volume V. Then, there exists anmxD matrix, M, which satisfies Assumption 
Set 2 in Section \2~3\ with 

D 



m = 0\de log 



e ■ reach (Ai) 



+ de~ z J + e~ z log V 



and 



E M (y) = VTT7- 



Proof: Any m X D matrix which embeds B C 1R D from Lemma [8] into 3R' n with (C3 • e)-distortion will satisfy both 
items (a) and (b) of Assumption Set 2 in Section l2~3l (see Lemmas [7] and [SJ. Similarly, any given mxD matrix which 
embeds B' C R D from Lemma|9]into W with (C5 ■ e)-distortion will satisfy item (d) of Assumption Set 2. Finally, 
just as in the proof of Theorem [3] above, Lemma [5] applied to all at most (/ + l)Kj affine projectors yields a subset of 
1R D of size at most (/ + l)Kj ((12/e) d + l) for item (c) of Assumption Set 2. Theorem \T\ applied to the union of this 
subset with B U B' guarantees the existence of 

O (e- 2 log (\B\ + \B'\ + a + l)K, ((121 ef + l))) x D 

Johnson-Lindenstrauss embedding matrices which satisfy Assumption Set 2 with high probability. Applying Lem- 
mas|6][8] and|9]to bound Kj, [B\, and |B'|, respectively, now finishes the proof. □ 

In the proof of Theorem |4] above, Theorem[T]is applied to embed a set which only depends on the given manifold, 
Ai, and its GMRA approximation. More specifically, no knowledge was assumed regarding any point x e R D - 
Ai which we might be interested in approximating via Algorithm Q] Thus, Theorem H] provides us with a uniform 
approximation guarantee for all x e 1R D on which we might apply Algorithm[T] However, we pay several penalties for 
this uniformity. First, the number of rows in our measurement matrix, m, now depends on the extrinsic dimensionality, 
D, of the given manifold. Second, the resulting uniform error bounds are only nontrivial for input points, x, which are 
close to the given manifold. Hence, although Theorem|4]implies that Algorithm[T]enjoys a limited form of stability, it 
does not provide very robust uniform error guarantees in practice. 



5. Empirical Evaluation 

We implemented Algorithm [T] and present an empirical evaluation of the algorithm in this section]^ We consider 
the following examples: 

(i) Aii : 20, 000 points sampled from a "swiss roll", a 2-dimensional manifold S; 

(ii) Ai2'- 40, 000 points sampled from a unit 9-dimensional sphere S 9 ; 

(iii) AI3: 5,000 pictures of the digit '1' from the MNIST data base of images, 28 X 28 pixels, of handwritten 
digitfl with each picture having pixel intensity normalized to have unit L 2 norm. 

(iii) Aii'. 15, 000 points from the MNIST data base, with 5, 000 points sampled from each of the digits 1, 3, 5, with 
each picture having pixel intensity normalized to have unit L 2 norm. 

(iv) AI5: the Science News text document data set, which comprises 1163 text documents, modeled as vectors 
in 1153 dimensions, whose z'-th entry is the frequency of the z'-th word in a dictionary (see [19] for detailed 
information about this data set), normalized so that every document vector has unit Euclidean norm. 



All code is freely available at |http : / /www . math . duke . edu/ ~mauro| 
Available at |http : / / yann . lecun . com/ exdb/mnist/ . | 
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We construct the GMRA on these data sets in order to obtain the linear approximations, Py for each scale j considered 
below in the noiseless setting. 

For the noisy experiments we add Gaussian noise, N(0, %Id) where D is the (ambient) dimension of the data, 
to each data point for a = 0,0.05,0.1. We then use the noisy data to compute the GMRA approximations of the 
noisy data, as well as the random projections utilized by the proposed reconstruction algorithm J{. We consider the 
following measures of approximation: 

(10) re MSE JH,M,j 2 := - > — }^ Jn , relMSE, 2 := - > - !\ ' 

where {x;}" =1 are the data points, j is the level in the GMRA, ranging from to / (dependent on the data set), 3\ is 
the proposed Algorithm, and M is a fixed random (with respect to Haar measure) orthogonal projection with range 
of dimension (dj ■ m) A D, where the "oversampling factor" m — 1,2,4,16, and the "intrinsic dimension" dj = 
maxj; dim(range(Py j i : )). Therefore, dj is the dimension of the manifold (2 and 9, respectively) for M\ and yVb- The 
dimension parameter, dj, is adaptively chosen in a scale-dependent way for M3, Mi, M5 as described in [3], with 
actual values used in these examples reported in Figure [TJ 

There we also run SpaRSA ll48l (for reasonable choices of the several parameters involved), one of the leading 
algorithms, among many, for sparse reconstructions. We notice that: (a) for general real world data sets it achieves 
comparable precision to our algorithm, suggesting that the GMRA dictionaries may be used in the context of standard 
sparse approximation; (b) for low-dimensional manifold synthetic data sets, which do not curve in many dimensions, 
it achieves higher accuracy, since the directions of a few tangent planes are sufficient to span a subspace containing 
the whole manifold. 

Finally, in Figure|2]we report running times, for the same data sets as in FigureQ] for our algorithm J?I and SpaRSA. 
These graphs suggest that our algorithm can perform several orders of magnitude faster than SpaRSA. In the examples 
shown it took a few seconds to run Algorithm[TJon all the points, with SpaRSA taking a significant fraction of a second 
to run on a single point. 



6. Conclusion 

In this paper we discussed the ability of random projection to embed an intrinsically low d-dimensional submanifold 
of R D , together with a piecewise linear approximation to the submanifold, into p t ,°( dlo S d ) in a way which (approxi- 
mately) preserves the fidelity of the embedded piecewise linear approximation to the embedded manifold. Although 
any collection of approximating affine spaces suffice, we focussed on the type of multi-scale linear approximations 
provided by GMRA [3 1 in particular. It is worth mentioning that the entire Geometric Wavelet Transform (GWT) fl3] 
of a point near a given manifold can also be preserved by the type of random projections discussed herein. 

Note that the GWT of a point on a given manifold will always be approximated by the sum of at most Jd vectors 
(where / is the number of scales in the GWT). So, pessimistically, a random projection needs to preserve all distances 
in a number of 0(/d)-dimensional subspaces which is bounded above by Lemma|6]in order to approximately preserve 
the entire geometric wavelet transform of each point on the manifold. Thus, the GWT of each point on a given 
manifold should be preserved in compressed form by a random linear projection onto a subspace whose dimension, 
m, satisfies a variant of Theorem[3]with d replaced everywhere by Jd. 
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Appendix A. Proof of Lemma[8] 

To prove this lemma we will modify the proof of Theorem 3.1 in [JS). The proof of Theorem 3.1 proceeds in two 
steps. First, a finite set, B C R D , of points on/near the given manifold M is defined. The main body of the proof then 
consists of demonstrating that any mxD matrix, M', which embeds B into R'" with 0(e) -distortion will also satisfy 

(1 - e) ||x - y|| < ||M'f-M'y|| < (l + e)||x-y|| 

for all x,ye M. Our proof will proceed along a similar path. We will begin by first defining a modified version of the 
set, B, considered in [6|. We will call this set B. Then, we will prove that any m X D matrix which which embeds B 
into R m with 0(e) -distortion will also satisfy item (a) of Assumption Set 2 in Section l2~3l 

Let cLm y) denote the geodesic distance between x, y e M. Furthermore, let Tan^? denote the d-dimensional 
tangent space to M at each x e Finally, let 

8 M ,t>W = [$£M\d M (x*,#) < 5 

for each b e R + and x e M. 

We are now ready to construct B C R D as per |6| as follows: Set T = o(jj ■ minjl, reach and, for each 
x £ A\, let Q 2 (x) C Tanf denote a minimal (e ■ T/ Voj-cover of the rf-dimensional Euclidean ball of radius T 
centered at e Tanf . Next, choose A C M to be a minimal finite cover of M satisfying 

min dyvi X*) < T, 

for all xe M. Then, 

B:=|jHu(<r+Q 2 0?)). 

In the next paragraph we will define our modified set, B C R D , which is a superset of the set B defined above. 
Fix j e [/] and k e [Kj]. For each a e A above, let e &m,t be such that 

ll^jt - c jrk \\ < \\y-Cj, k \\ Vye& MJ (a). 

Let Aft = \aj,k \ -A]- Furthermore, denote the (d + l)-dimensional vector space spanned by Tan^ (J jc^jt - ctjjA by 
Tan^y /c , and then let (a) C Tan^y /c be a minimal (e ■ T/ Voj-cover of the (d + 1) -dimensional Euclidean ball of 
radius T centered at 0. To finish, define 



% := (J [aj,k\ U (a j:k + Q jjk (a)) 



SeA 



and then set 



B 



U B M u f c » 

\jeffl, ke[K,] 



ubuq, 



where Q C 1R D is as defined in Lemma [7] 
Note that B will be bounded above by 



(J+l)-Kj l+.max \Bj, k \\ + \B\ + \Q\. 

\ ;€[/], k€[Kj] ') 

Applying Lemma|7]to bound |Q|, Lemma|6]to bound Kj, and appealing to Section 3.2.5 of [6 1 to bound \B\, the previous 
line reveals that 

\0(d) i \ / n \°W 



(ID 



\B\ «: 2 0(/ ' d) • V • 



min{l,Ci} 



max LB.-jtl I + V 
je[J], ke[Kj] ' 



D 



e • min {1, reach / 



We now finish bounding the cardinality of B by noting that \BjjA will always be bounded above by the upper bounds 
for \B\ in Section 3.2.5 of [6 | after every occurrence of K = d is replaced with d + 10 The stated upper bound on |b| 
follows. 

We will now complete the second portion of our proof by demonstrating that a sufficiently precise linear embedding 
of B will satisfy item (a) of Assumption Set 2. First, since B C B, Theorem 3.1 in [6 | guarantees that a low-distortion 
embedding of B will preserve all pairwise distances between points on the manifold M. Furthermore, any embedding 
of B will also embed all c^-vectors since they form a proper subset of B. Hence, if suffices for us to show that a 
sufficiently precise linear embedding of B will (approximately) preserve the distance from each cy^-vector to all points 
on the manifold At. 

Fix ;' e [/], k e [KA, and x e M. Let a ' e A be the closest element of A to x, 

a' — arg min dj^ (a, x) . 

aeA 

Finally, let x'. k denote the projection of x onto the (d + l)-dimensional affine subspace a'. k + Tan^ , j k . By considering 
the Taylor series expansion of the unit speed parameterization of the geodesic path from a '. k to x on ISA, we find that 



= x'- k + r, where ||fj| = O 



reach (M) 



In fact, the magnitude of the remainder, r, is also O | v - a 

Furthermore, the definition of a '. k e M implies that x — a 

Continuing with the proof, suppose that anmxD matrix, M', embeds B into R m with 0(e)-distortion. A trivial 
variant of Lemma|5]then implies that 



j since T < reach (M) /2 (see Corollary 2.1 in (6)). 

= 0(||f-Cy,jt||). 



\\M'x-M'c jrk \\ < 



M'x-M'x 



M'x'. k -M'c j/k 



< \\M'i\\ + (1 + 0(e)) 







x' - 

x j,k 


Cj,k 



< (l+@(e))(||f-c ; -, fc || + ||rl|) + ||M'r|| < (1 + 0(e)) ||f - c jrk \\ + ||M'f|| + O 



x - a 



i,k 



since Qj ik (a') c Tseng, ^ is a proper subset of B, and {x '- k — Cj rk j e Tan^, ^. In addition, the fact that QcB together 
with Lemma|7]guarantees that M' will have the RIP(D,d,@(e)). This fact combined with the Holder inequality finally 
reveals that 



\\M'x-M'cj ik \\ < (1 +e(e))||j?-^|| + 



i,k 



< 1 + 0(e) + O 



■ T 



< (l + 0(e))||j?-^||. 

The lower bound for ||M'x - M'cyjt|| is established in an analogous fashion. We have the stated theorem. 



Intuitively, we are increasing the effective intrinsic dimensionality of M from d to d + 1 in the process of creating our B^-subsets. 
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Appendix B. Proof of Lemma[9] 

The proof of this Lemma borrows heavilly from the proof of Lemma[8] Set T = O^^Jp . min{l, reach (A1)}). 

We will begin by defining the set B' c R D . Let A c M, B c R D , and Ay,* = {fl/,(c | a e A} C M for each 
j e [J], k e [Kj] be defined as in AppendixlAlabove (except now using the smaller value of T from the second sentence 
of this appendix). Let Tan^ denote the (2d + l)-dimensional vector space spanned by 

Tan «v U - M U H^y I y e rD 1 

for each dyjt 6 Aj^. Furthermore, for each flyjt e Ayjt, let Q'^ (a) C Tan^yj. be a minimal © ■ T/ Voj-cover of the 
(2d + l)-dimensional Euclidean ball of radius T centered at e Tan^. To finish, define 



aeA 



for each j e [}], k e [Kj], and then set 



B' 



B' U\S 



UBUQ, 



U - 

\j€[J], k€[Kj] 

where Q C R D is as defined in Lemma|7] It is not difficult to see that \B'\ will be bounded above as per Equation [TT1 
after e is replaced everywhere by 2'e. Simplifying yields the stated upper bound. 

We will now complete our proof by demonstrating that a sufficiently precise linear embedding of B' will satisfy 
item (d) of Assumption Set 2. Fix j e [/], k E [Kj], and x 6 At Let a ' e A be the closest element of A to x, 

a' = arg min dj^ (a, x) . 



leA 



Finally, let x'. k denote the projection of x onto the (2d + l)-dimensional affine subspace a'. k + Tan^y k . By considering 
the Taylor series expansion of the unit speed parameterization of the geodesic path from a '. k to x on ISA, we find that 



= x'j k + r, where ||rj| = O 



reach (M) 



x - a 



it 



Furthermore, we recall that the magnitude of the remainder, r, is also O ^ 

To finish, suppose that anmxD matrix, M', embeds B' into W with ©(e)-distortion. A trivial variant of Lemma|5] 
implies that 



I since T is sufficiently small. 



j|M'x-M'Py,fc(£)|| < 



M'x-M'x 



< (1 +@(e))(||f- Py /Jt (f)|| + |fl|) + \\M'r\ 

< (1 + 0(e)) || j? - TPj, k (x)\\ + \\M'i\\ + o( 



M'x' jk -M'P j/k (x)j < \\M'r\\ + (1 + 0(e)) 

) 



since Q'. k (a') c Tan^ i s a subset of B', and \oc' k - P e Tan^-j.. In addition, the fact that Q c B' together 
with Lemma|7]guarantees that M' will have the RIP(D,d,0(e)). This fact combined with the Holder inequality reveals 
that 



x — a 



< (l + 0(e))||x-P ///c (x)|| +0 



\\M'x-M'Vj, k (x)\\ < (l + 0(e))||x-P M (x)|| + O 

< (l + 0(e))||x-Py /i (x)||+2- / 

whenever T is weighted by a sufficiently small (universal) constant. The lower bound for ||M'x - M'Py^ (x)|| is 
established in an analogous fashion. 
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